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I. INTRODUCTION 

Dynamical diffraction theory describes the multiple 
elastic scattering of electrons through a crystal. The the- 
ory has been formulated almost a century ago, originally 
for photons by Bragg and von Laue. Later it was ex- 
tended also to electrons on the basis of wave-particle du- 
ality, as proven by the famous experiments of Davidson 
and Germer, and Thomson and Reid. The importance of 
inelastic electron scattering and the wealth of informa- 
tion contained in this process was for the first time ob- 
served by Kikuchi 1 , who observed a complicated pattern 
of lines going beyond the diffraction patterns expected 
for elastic scattering of light or electrons. The diffraction 
theory that included quantitative description of Kikuchi 
patterns was given by Kainuma following qualitative the- 
ories of Shinohara and von Laue^. Various sources of en- 
ergy loss, such as excitations of phonons or valence elec- 
trons to conduction band or core electrons to conduction 
band were discussed by Okamoto et al. 3 

The inelastic electron scattering is typically described 
as a three-step process: 1) an elastic scattering of the 
probe electron from entrance surface to a selected atom 
in the sample, 2) an inelastic event exchanging the mo- 
mentum and energy between a probe and sample elec- 
tron, and 3) an elastic propagation of the scattered probe 
electron towards the exit sample surface. This needs to 
be summed over all possible inelastic scattering centers. 
Such description holds, when there is only one inelastic 
scattering event for each observed probe electron. If its 
mean free path is much longer than the thickness of the 
sample, then it is a good approximation. Otherwise one 
needs to consider multiple inelastic scattering^. 

The inelastic event is described by mixed dynamical 
form-factor (MDFF) introduced by Kohl 6 . For calcula- 
tions of MDFF, there are various levels of sophistication, 
ranging from an isotropic dipole approximation q • q', 
through a parametrized atomic multiplets description 9 , 
configuration interaction^, to a density functional the- 
ory evaluation^. Recently a DFT-based dipole model 
for M DFFs has been published irP using local electronic 



structure properties to set the coefficients in the dipole 
approximation, thereby providing a realistic DFT-based 
MDFF model with efficiency equal to a simple dipole ap- 
proximation (at the cost of losing the fine structure in 
energy dependence). 

The elastic scattering can be simulated by one of the 
two major methods widely used today, th e mu ltislice 
method™^ and Bloch-waves (BW) method 13 - 11 ^. The 
latter one is typically more efficient for periodic struc- 
tures without defects, while the multislice method has 
an advantage when dealing with large non-periodic struc- 
tures or structures with defects. In this article we will 
study the convergence properties of the BW method in 
electron energy-loss near edge structure (ELNES) calcu- 
lations and describe an efficient algorithm for BW sum- 
mation. 



II. BLOCH-WAVES THEORY OF ELNES 

In Re f. [5] we have described our theoretical approach to 
simulate general orientation-sensitive ELNES experiment 
from first principles. For the sake of completeness, we 
present here a brief summary of the key equations. 

The first step is a solution of the secular equation for 
a fast electron of known energy moving in and out of the 
crystal. Expanding the solutions - Bloch waves - into 
plane waves (indexed by reciprocal lattice vectors g, h) 
the secular equation has the following form: 
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where K 2 = Uq + 2meE/h 2 , m and e are, respectively, 
the electron mass and charge, U s = 2meV s /h 2 where V s 
are the Fourier components of the crystal potential, and 
k^) = k + 7^11 are Bloch vectors related to the beam 
direction k and sample surface normal n. 

Solution of the secular equation is a set of Bloch waves 
indexed by j, given by so called Bloch coefficients Cg^ 
and elongations of the wave vector 
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Evaluation of the double differential scattering cross- 
section involves calculation of the following sum [see, e.g., 
Ref.El Eq. (24)] 
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Here C g j) and are the Bloch coefficients for the 

incoming and outgoing Bloch fields. The quantity 
SuCq^g) ^ g ^ e mixed- dynamic form-factor (MDFF) di- 
vided by squares of momentum transfer vectors q, q' 
(Coulomb potential factors). Tjij>i>(t) is a thickness func- 
tion, which depends on Bloch wave indices and experi- 
mental geometry. N u is a number of atoms in the unit 
cell, where u is a base vector. Momentum transfer vec- 
tors actually depend on several indices, but to simplify 
the notation we will not write this dependence explicity. 
For evaluation of MDFFs we typically use an approxima- 
tion 
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which neglects elongations of the wave vectors 7^'^. 



III. SUMMATION ALGORITHMS 

Here we will discuss two known algorithms used in 
Bloch wave calculations and propose new algorithms for 
improved convergence and efficiency of summation. 

A. Manual selection of beams 

The simplest one is based on a choice of beams "by 
hand" let's say on a base of visible spots in diffraction 
pattern, or all beams from a set of integer indices be- 
low certain cut-off, or beams on a systematic row, etc. 
For example, irP^we used a systematic row approxima- 
tion both for secular equation and summation, i.e., we 
picked only a set of ~ 10 beams along the systematic 
row of reflections. Pragmatic hand-selection of beams is 
widespread in literature, see for example RefsJ^H^U. 



B. Excitation error and extinction distance based 
selection of beams 

In our previous work 5 we suggested to choose beams 
on the base of their excitation error s s and extinction 



distances £ g . Their product forms a dimensionless vari- 
able Wg, for which we set a cut-off criterion. This method 
picks beams that follow the Ewald sphere and provides 
automatically a more economic and accurate description. 
A variant of this method uses two different cut-offs for the 
Wg - one for the secular equation (hundreds or thousands 
of beams) and one for the summation (10-15 beams). 
We can safely take a rather large set of beams in the first 
step - the solution of the secular equation - and then for 
the summation we pick only a subset of the beams and 
Bloch waves. The selection of the subset of Bloch waves 
is based on their norm within the subspace of beams se- 
lected for summation^. This has been used in our more 
recent publications, e.g.j 15 | 16 | 2 -^H 

Both this approach and the manual selection of beams 
identify a set of beams and Bloch waves and then they 
count all the cross-terms. I.e., if we have N s beams 
and Nj Bloch waves, the summation runs over N^Nj 
terms, which can be a huge number already for only 10 
g-vectors and 10 Bloch waves. A detailed inspection of 
these terms shows, that majority of them are actually 
negligible. That means, that we sum a lot of tiny terms, 
which is not only inefficient but also contributes to the 
propagation of machine rounding off errors. 



le+09 
le+08 
le+07 
le+06 
le+05 
10000 
1000 
LOO 
10 

1 




■ 22/20 g, 7/9 BW 

□ 12/14 g, 9/14 BW 

□ 12/14 g, 5/9 BW 




-25 



-20 



-15 -10 
Log 10 IYI 



FIG. 1. Histogram of the distribution of Y^J h , terms mag- 
nitudes as a function of number of g-vectors and Bloch waves 
(BW). 

To illustrate the finding, we have set-up test calcu- 
lations in a 3-beam orientation with G = (200) and 
detector orientation (^, y). By setting the maximum 
Wg = 10 5 for the secular equation, the program identified 
a list of 628 and 627 g-vectors for incoming and outgoing 
beams, respectively. For the summation we set the maxi- 
mum w s = 500, which filtered the list down to 12 and 14 
g-vectors. These contained [000], ±[200], ±[110], ±[110], 
[310], [310], ±[400] and [020]. The outgoing beam in- 
cluded on top of that [020] and [510]. With the criterion 
of minimal norm of the Bloch wave in the subspace of 
identified beams 0.01 we got 5 and 9 Bloch waves, while 
for criterion 0.005 we got 9 and 14 Bloch waves for in- 
coming and outgoing beam, respectively. In total that 
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Bloch waves, for which \C^\ < P m i n or \Dq } \ < ^min- 
The resulting array is sorted according to decreasing 
magnitude using QuickSort algorithm. In total, this step 
has computational complexity 0(N^ N2 log AT 2 ), where 
N 2 is length of the list of diads. 

The next step is creation of a list of quadruple prod- 
ucts \C^ ) c!g ) D$ ) D { £\ > P min . This operation is 0(iV 4 ), 
where N4 is the number of the quadruples larger than 
Pmin, because the lists of diads were sorted by magni- 
tude. The maximum number of failed comparisons is 
0(^2), where N2 is the length of the longer of the two 
lists of diads formed in previous step. 

The list of quadruples is again sorted with QuickSort 
algorithm, but this time first by the q-vector given by 
h — g and then by magnitude. This operation costs 
0(A^ log A^) operations. Now we have prepared data for 
the final step, which is an identification and output of oc- 
tuple products larger than P min and, simultaneously, out- 
put of the q, q' diads for the MDFF calculation. Thanks 
to the way, how we sorted the list of quadruples, we can 
serially process the list and output q, q' and correspond- 
ing octuples without actually holding the array of oc- 
tuples in memory. The number of operations is 0(N$), 
where N$ is the number of octuples larger than P m i n . 
Maximum number of failed comparisons is given by the 
number of q, q' diads, which is well below Ng. 

As a graphical summary of the main steps of the al- 
gorithm, a schematic flowchart diagram has been plotted 
in Fig. [2] 

Memory requirements are very favorable. Most often, 
the largest arrays are the N s x N s matrices used in the 
secular equation. The lists of quadruples rarely reach 
this lengths, only perhaps for extremely small P m i n of 
the order below 10 -6 . The octuples are never held in 
memory, since the array is created and output serially 
based on the favorable sorting of the list of quadruples. 

As will be shown below, Ng can be orders of magnitude 
below the N^N^ even if naturally it has to be propor- 
tional to that. Its value strongly depends on P m i n and 
it turns out that there is an approximately inverse pro- 
portionality between, i.e., N% ex P~ m - Tuning the P m i n 
allows to find a suitable compromise between the speed 
and accuracy, which will be discussed below. Also note 
that we are selecting largest terms from a large set of 
beams, of the order of 1000, which is not feasible to treat 
by above-mentioned methods. Therefore it can happen 
that stricter selections of beams + including all cross- 
terms can lead to less accurate calculations, because some 
singular contributions of considerable magnitude beams 
can be missed. Examples of such situations will be shown 
below. 
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FIG. 2. Flowchart of the algorithm for automatic selection of 
dominant terms. 



makes above 57 million vs 354 million terms, per energy 
and thickness. The computing time of one thickness pro- 
file on a single 2.0 GHz Intel Pentium 4 Xeon CPU was 
45s vs 285s, respectively. 

Instead of increasing number of Bloch waves, we also 
tested the effect of enlarging the set of g- vectors in sum- 
mation by setting w s = 800. The number of selected 
g-vecotrs grows to 22 and 20, and some of the hkl beams 
have nonzero I (higher order Laue zones). Keeping min- 
imal norm for Bloch waves 0.01 we got 7 and 9 Bloch 
waves. In total it gives 768 milions of terms, which were 
summed in 3886s. 

Figure [l] shows histograms of distributions of the sizes 
of all these terms. It turns out that if we want to as- 
sure that all of the, let's say, 1000 dominant terms (for 
our fixed selection of beams) are included in the summa- 
tion, we are also including into the sum many millions of 
terms with much smaller magnitudes, majority of them 
having negligible influence on results. As we will show be- 
low, 1000 dominant terms might not always be enough in 
terms of convergence. Then one can easily conclude that 
requiring a summation over, e.g., 10 5 dominant terms 
would require such a number of beams and Bloch waves 
that we would end up summing billions of terms, greatly 
wasting computational resources. 



C. Automatic selection of dominant terms 

The new methods described below are designed to 1) 
avoid summation of negligible terms, 2) improve the scal- 
ing of the summation. They are built on top of the 
Wg based selection of the beams for diagonalization, as- 
suming that we take a sufficiently large set of beams for 
the secular equation, typically hundreds up to few thou- 
sands. New development is in the algorithm for selection 
of terms to be summed from the large set of beams and 
Bloch waves, where no term is a priori rejected. 

The algorithm is controlled by a single cut-off criterion 
Pm\ n - In the first step we create lists of diads of the Bloch 

> P min , for 



coefficients, \C ( j) C ( s j) \ > P min and \D^D^\ 
incoming and outgoing beam, respectively. Since none of 
is larger than one, we can safely ignore all 



\ci j) \ov\D^ 



D. Automatic term selection with MDFF 
asymptotic 

A modification of the algorithm is possible, if we 
take an advantage of the dipole-type asymptotic of the 
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MDFFs and the l/q 2 q' 2 denominator. For large q- vectors 
the denominator suppresses the terms, therefore we can 
do an even more efficient rejection of the negligible terms. 
In the dipole approximation MDFF is proportional to 

5(q, q', E) cc q • N(E) ■ q' + (q x q') • M(E) (5) 

where TV is a energy-resolved tensor dependent on density 
of states, local anisotropics and spin-orbit coupling and 
M is an energy-resolved vector function of local magnetic 
properties^. Ignoring the energy dependence, the asymp- 
totic behavior of MDFF as a function of q, q' vectors 
is #(q, q') oc qq' . Combining this with the denomina- 
tor l/q 2 q' 2 accompanying every MDFF, we obtain 1/qq' 
asymptotic behavior of terms. 

In order to keep a dimensionless variable for the cut- 
off criterion, we attach to the lists of quadruples a factor 
<W%h, where q = k/ - k* and q gh = q + h - g. A 
small complication arises from this choice, since it is pos- 
sible that the ratio qo/q s h is larger than one for some 
larger scattering angles or energy losses. For such even- 
tualities we need to make sure that our lists of diads 
have a 'reserve', i.e., the cut-off for the list of diads needs 
to be reduced. We have implemented a cut-off P m i n /10 
for the list of diads, which is an arbitrary choice, yet 
it turned out to be both safe and not too costly, when 
compared to other list operations in the algorithm. The 
rest of algorithm is unchanged, only when outputting the 
list of selected octuples, we remove the asymptotic factor 

<?o/4gh<2g'h'. 

In the rest of the article, we use this modified auto- 
matic term selection algorithm (MATS). 



IV. RESULTS 

In this section we compare the various methods of per- 
forming the Bloch waves summation and demonstrate 
some of new possibilities offered by the MATS. 



A. Weakly excited spots 

In systematic row geometries, many simulations have 
been performed by only choosing beams along the sys- 
tematic row. However, depending on the tilt of the beam, 
some of the spots outside the systematic row can be 
weakly excited. That can be easily missed in the system- 
atic row approximation. Here we will show a simulation 
of a three-beam orientation for bcc iron with systematic 
row index G = (200). We will consider a beam tilt of ap- 
proximately 5 degrees from (001) zone axis orientation, 
which corresponds approximately to incoming beam ori- 
entation (0, 1, 10). 

Let's compare precision and timing of a systematic row 
approximation (SRA), K; g -based beam selection (WGBS) 
and the MATS. In the SRA we included beams up to 
d=4G, in total 9 beams and the summation was performed 



SR, 9 beams WGBS, 300/0.01 MATS, 0.01 MATS, 0.0001 
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FIG. 3. Diffraction pattern of iron in 3-beam orientation with 
G = (200). Top row corresponds to 30nm sample thickness, 
bottom row is at lOOnm. From left to right various methods 
of calculation have been used, namely systematic row (SR) 
approximation, u> g -based beam selection (WGBS) and auto- 
matic term selection with MDFF asymptotics (MATS) with 
two different cut-offs (see text for details). Intensities are on 
a logarithmic scale. 



over all 9 resulting Bloch waves. In the WGBS, for the 
secular equation we set the w s cut-off to 100000, which 
resulted in approximately 630 beams. For the summa- 
tion, we used cut-off 300, which was fulfilled by 8-14 
beams. For the Bloch waves we required the subspace 
norm to be larger than 0.01, which was fulfilled by 4-27 
Bloch waves. Finally, in the MATS we tested for summa- 
tion the following two cut-off criteria: P m i n = 0.01 and 
0.0001. 

The resulting diffraction patterns are summarized in 
Fig-H 

Regarding the computing costs, the fastest was the 
MATS simulation with P m i n = 0.01, which finished in 
43 CPU hours. In this calculation, larger part of the 
time was spent in diagonalization of the secular matrices 
of dimension 630. MATS calculation with P min = 0.0001 
finished in 98 CPU hours. The SRA calculation took 
considerably more time, despite being much less accu- 
rate: the 9-beam calculation finished in 561 CPU hours. 
Note that here we are diagonalizing very small matri- 
ces, only 9-by-9, therefore practically all the computing 
time is spent in summation. Finally, the WGBS calcu- 
lation required 2800 CPU hours, almost 70-times more 
than MATS with P min = 0.01, yet being of lower ac- 
curacy. While in our SRA calculation we always sum 
9 8 = 43 x 10 6 terms, in MATS with P min = 0.0001 it 
was on average only 70 x 10 3 , in maximum reaching half 
million. 



B. EMCD and rriL/ms maps of iron 

Electron magnetic circular dichroism (EMCD) is a re- 
cently developed experimental technique^, which uses 
ELNES to extract the atom-specific magnetic character- 
istics, such as spin and orbital moments. For quantitative 
analyses it used sum rule d 25 * 2 ^ , using which one can ex- 
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TABLE I. Average lengths of double (iV 2 ), quadruple (iV 4 ) 
and octuple (iVs) product lists, average number of momentum 
transfer diads per energy step (iV qq /) and computing times 
(total and per-pixel average) for maps in Fig. [4] as a function 
of convergence parameter P m in- Times refer to a single Intel 
Pentium 4 Xeon processor at 2.5GHz. 
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FIG. 4. Diffraction patterns (top row), relative up-down dif- 
ference maps (middle row) and apparent rriL/ms ratio maps 
(bottom row) a function of convergence parameter P mm . Cal- 
culations were performed for 20nm layer of bcc iron in two- 
beam geometry with G = (200), at 300keV. 



tract the ratio of orbital and spin moment of the studied 
atonPHl. These properties are highly sensitive functions 
of the edge-dependent ELNES spectra. Similarly as in 
experiment, also in simulations there is a high demand 
for precision. Below we will show the performance of 
MATS for the similar setup as above, two-beam case with 
G = (200) and beam tilt of 10 degrees, showing diffrac- 
tion pattern, distribution of the magnetic signal and the 
map of the mi / m s ratio as a function of the cut-off vari- 
able -P m in- 

In more detail, the distribution of the magnetic signal 
is obtained as a difference of the diffraction pattern and 
its mirror image with respect to the systematic row mir- 
ror axis. In the figure, it is shown as a relative quantity, 
that means it is divided by the sum of the diffraction 
pattern and its mirror image. In other words, it is an 
antisymmetric part of the diffraction pattern divided by 
its symmetric part. Here we will not enter the discus- 
sions about the continuum signal extraction or post-edge 
normalization^, since our main focus is the convergence 
of the Bloch waves calculation. 

The maps of the m^jms ratio are evaluated pixel- by- 
pixel from the magnetic signals Ml 3 and M^ 2 integrated 
over Ls an d L2 edge regions, respectively, by the follow- 
ing formulsP^U 



m s 



Mr 



3 M L3 - 2M L2 



(6) 



Intuitively, this should be a constant function throughout 
the diffraction plane. However, in fact, large variations 
occur due to asymmetries discussed in detail in 28 . The 
map of the m^jms ratio is a highly sensitive function of 
the scattering cross-section and is an excellent test of the 
convergence properties of the newly developed method. 

We have used the same w g cut-off as in previous sub- 
section, but we have varied the P m in from 10 -2 down 
to 10 -6 and recorded some statistic information about 
the number of terms included in the summation, Table [l[ 



Note that, as anticipated, the number of summed terms is 
inversely proportional to P m i n . Importantly, even at the 
most accurate calculation, the number of summed terms 
stays many orders of magnitude below 700 8 , demonstrat- 
ing the high efficiency of the selection of terms. 

Note that the diffaction pattern appears to be rea- 
sonably converged already for P m i n = 10 -2 . An atten- 
tive reader might spot slightly fuzziness, but neverthe- 
less, the differences between all five diffraction patterns 
are visually negligible. The relative difference map re- 
quires better convergence, at least P m i n = 10 -3 , or better 
-Pmin = 10 -4 . Note the numerical noise at P m i n = 10 -3 , 
particularly along the vertical line going through the 
transmitted beam. Although a slight fuzziness remains 
at P m i n = 10 -3 , but the accuracy is already satisfac- 
tory. The most sensitive quantity presented here is the 
apparent m^jms ratio. Since the orbital momentum 
in iron is small, the nominator in the sum rule expres- 
sion is a difference of two (typically small) differences of 
spectra, integrated over L2 or L3 edge, respectively. It 
requires high accuracy to obtain well converged maps. 
Calculation with P m i n = 10 -4 seems to produce reason- 
ably converged results. The only visible difference at 
Pmin = 10 -5 , when compared to P m in = 10 -4 , appears 
around (—1, ±1)G and (2, ±2)G positions. The results of 
calculation with P m i n = 10 -6 are visually indistinguish- 
able from P m i n = 10 -5 . 

Looking back into the Table [TJ we see that for most 
of our purposes P m i n = 10 -4 provides highly converged 
results at average costs 3-times lower than the simplest 



testing calculation discussed in Sec. |IIIB| Nevertheless, 
this criterion can depend on the studied crystal struc- 
ture and orientation and always should be tested, when 
performing simulations of new systems. 



C. Tilting crystal from the zone axis orientation 

High efficiency of the MATS method allows us to ap- 
proach experimental geometries, which require a large 
number of beams for a converged simulation. Particu- 
larly, the zone axis orientation is highly computationally 
intensive. Here we will follow the development of the 
diffraction pattern, when tilting from an exact zone axis 
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FIG. 5. Evolution of the diffraction pattern and of the map of 
magnetic signal when tilting the incoming beam from (001) 
zone-axis orientation to 3-beam orientation (016). Left half- 
plane of zone axis orientation is calculated with 25-beams, 
right half-plane of 3-beam orientation is calculated with 11 
beams on the systematic row (see text for details). Note the 
differences from MATS calculations. 



towards the 3-beam case with G = (200) in small steps of 
the tilt. Extremely rich patterns are observed, particu- 
larly when looking at the distribution of the magnetic sig- 
nal, with formation of a dense network of Kikuchi bands 
and lines. 

The Wg cut-off was the same as in previous sections, the 
P m i n was set to 10 -4 , leading to well converged diffrac- 
tion patterns and maps of the magnetic signal, as demon- 
strated in Sec. UVBl We chose to demonstrate the results 
at a sample thickness 50nm, because at this thickness the 
Kikuchi patterns appear sharp enough, yet not so sharp 
that we would observe aliasing artefacts due to discrete 
grid of pixels. 

The incoming beam is gradually tilted from an exact 
zone axis (001) direction towards the 3-beam orientation 
via (0,1,40) ~ 1.43 degrees, (0,1,20) ~ 2.86 degrees, 
(0, 1, 10) ~ 5.71 degrees and (016) ~ 9.46 degrees tilt. At 
the endpoints we also did calculations with manual selec- 
tion of beams. For the exact zone axis case we included 
kinematically allowed g- vectors from the zero-order Laue 
zone (ZOLZ) with hkl indices less than 4 (in total 25 g- 
vectors) and for the final 3-beam orientation we included 
beams of up to ±5G (in total 11 g- vectors). 

The results are summarized in Fig. [5] At the zone axis 
orientation we see a rich pattern, consisting of Bragg 
spots and multitude of Kikuchi bands and lines. The 
Kikuchi pattern is even better visible on the map of the 
magnetic signal. Number of thin lines with varying inten- 
sity and sign form a rich symmetric structure. Now let's 
compare them to a calculation with 25 g-vectors from 
ZOLZ. The diffraction pattern seems to display the same 
structure, except for higher intensity. A cautious eye 
would spot some differences in relative intensity around 
{220} spots. More differences can be seen in the maps 
of the magnetic signal. Particularly at larger scattering 
angles, the pattern of Kikuchi lines is more rich in the 
MATS description. 

As we tilt the beam, the pattern of beams is deform- 
ing, following the movement of the zone axis spot down. 
At beam tilt 1.43 degrees, corresponding to approxi- 



mately 25 mrad, the zone axis spot moved down by a 
bit less than 2G. The two- fold Bragg angle correspond- 
ing to G = (200) in bcc iron is 13.8 mrad, therefore the 
2G « 27.6 mrad, which agrees with the position of the 
zone axis spot in the map. The dominant Kikuchi bands 
are passing around this spot. In the map of the magnetic 
signal a noise-like signal forms around the zone axis cen- 
ter, and at the same time, there evolves a different dom- 
inant sign in the four quadrants of the diffraction plane. 
Both these trends continue with the tilt 2.86 degrees. 
The zone axis spot has weaker intensity, moves further 
down, to around 3.5G under the transmitted beam. At 
this angle and thickness, we see relatively strongly ex- 
cited beams for h = —3, —1, 1,3 and k = —1, i.e., spots 
under the systematic row of reflections. At 5.7 degrees 
tilt the zone axis spot is outside the vertical range of our 
maps, we see a clean 3-beam pattern, with some inten- 
sities at (110) and (110). Maps of the magnetic signal 
are considerably simpler, having a dominant sign in each 
quadrant. The four vertical lines represent Kikuchi lines. 
Skew lines at larger scattering angles show unexpectedly 
high magnetic signal, which might be a finding of poten- 
tial practical importance. 

At the final step, almost 10 degrees tilt the patterns 
complete the trends: three strongly excited spots along 
the systematic row, and three dominant Kikuchi bands. 
Magnetic signal showing the four vertical lines with signs 
corresponding to their quadrants. Interesting is a com- 
parison to a systematic row approximation, where we 
used only 11 g-vectors - multiples of the G = (200). 
These patterns completely miss the skew Kikuchi bands 
and in the maps of the magnetic signal there is a very 
reduced pattern of lines. 



D. Comparison between MATS and ICSC results 

A similar but alternative computer program which cal- 
culates inner-shell ionization and backscattering cross 
sections for fast electrons incident on a crystal is avail- 
able, known as the ICSC code, developed by Oxley and 
Allen 30 . The program calculates the inelastic scattering 
coefficients for inner-shell ionization, pertinent to EELS 
and energy dispersive X-ray (EDX) analysis, using pa- 
rameterizations of the atomic inelastic scattering fac- 
tors. The program treats the dynamical scattering in 
a very similar way to MATS, but simplifies the calcu- 
lations of dynamical form factors, using the approxima- 
tion where the integration over all the final states of the 
scattered electron is replaced by an analytic expression, 
which unfortunately does not cover MDFF calculation. 
This means that the ICSC code only allows the EELS 
detector geometry concentric with the incident beam di- 
rection. The ICSC code has been intensively tested for 
EDX results of atom location by channeling enhanced 
microanalysis (ALCHEMI), but little was published in 
testing the ICSC program for experimental EELS. It is 
thus a good chance for comparing between the two meth- 
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FIG. 6. Comparison of ICSC and MATS. Thickness profiles of 
K-edge of Si in SiC in systematic row orientation. Horizontal 
axis corresponds to a beam tilt in multiples of G = (220) and 
the vertical axis is thickness in nm. a) ICSC calculation, b) 
calculation with the new code, adopting the same parameters 
as ICSC, c) MATS calculation. 



ods. 

As a benchmark we selected a cubic SiC crystal and 
the orientation dependent Si K-edge cross sections were 
compared. We have done two sets of calculations: 1) 
thickness dependence in a beam-rocking experiment in a 
systematic row orientation, and, 2) 2-dimensional beam- 
rocking in a (001) zone- axis orientation. In both cases, 
we selected parallel illumination and convergence angle 
of 10 mrad. The acceleration voltage was set to 300 keV. 
ICSC uses Doyle- Turner scattering factor therefore 
for the sake of consistency, we used these also in calcula- 
tions with the new code. 

In the systematic row calculation, we assumed a beam 
tilt of approximately 10 degrees towards the Gp20) sys- 
tematic row conditions, i.e., the zone axis was (118). For 
ICSC we used as set of 85 beams (hkl) _L (118). With 
the new code we did two sets of calculations - one with 
the fixed set of 9 beams from the systematic row (SR; up 
to ±4G(220)) an d another in the MATS approach with 
convergence parameter 10 -4 . Results are summarized 
in Fig. [6j The results show good qualitative agreement, 
yet there are clear differences in details, even when com- 
paring the ICSC calculation to a SR calculation. That 
indicates that approximations introduced in ICSC par- 
tially neglect some details of the dynamical diffraction 
process. Full MATS calculation shows expectedly even 
more of a fine structure, especially at larger thicknesses. 
Detailed comparison to experiment would be needed to 
verify the fine features observed in MATS calculation. 

The dependence of the double-differential scattering 
cross-section on beam rocking from the (001) zone axis 
orientation is shown in Fig. [7] The ICSC used results are 
shown in the left column. A set of 197 beams was used, all 
from the zero-order Laue zone. The same ones were used 
for comparison with the new code (shown in the middle 
column). Finally, results obtained by the MATS method 
are shown in the right column. General qualitative fea- 
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FIG. 7. Comparison of ICSC and MATS. Intensity of K-edge 
of Si in SiC in (001) zone- axis orientation as a function of 
beam rocking. Axes corresponds to a beam tilt in multiples 
of G = (200). a) ICSC calculation at 100 nm, b) calculation 
with the new code, adopting the same parameters as ICSC, 
c) MATS calculation at 100 nm, d)-f) the same as a)-c), but 
at 200 nm. 

tures are in good agreement between all three computa- 
tional approaches, nevertheless, as in the systematic-row 
orientation, also here we observe notable differences. Of 
particular interest is a soft breaking of the four-fold sym- 
metry in the case of MATS calculations. It is easier to 
spot at 100 nm, where the central intensity maximum has 
a slighly prolate shape along the main diagonal of the pat- 
tern. This can be explained by presence of beams from 
higher-order Laue zones in the secular equation. They 
introduce (hkl) beams with I ^ and due to the curva- 
ture of the Ewald sphere, (hkl) and (hkl) have different 
excitation errors and thus their corresponding Bloch co- 
efficients differ. In combination with the four-fold screw 
axis of this structure and a lack of inversion symmetry, 
the resulting beam-rocking pattern shows deviation from 
four-fold symmetry. 

V. CONCLUSIONS 

We have developed a new method for accurate sum- 
mation over Bloch waves and their plane wave compo- 
nents (beams) named modified automatic term selection 
(MATS). The complexity of MATS scales inversely pro- 
portionally with the cut-off for the term sizes. It allows 
highly accurate calculations at much lower computational 
costs compared to previous methods. We have demon- 
strated advantages of the method on capturing the inten- 
sity of the weakly excited spots outside the systematic 
row. The convergence properties of MATS were studied 
on a two-beam case simulation with focus on faint ef- 
fects observed in EMCD experiments. A rich pattern of 
Kikuchi bands and lines was presented in a simulation of 
a tilting of the bcc crystal from the zone axis orientation 
to a three-beam orientation. We have also compared the 
new method to ICSC code and both display qualitatively 
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the same results. In more detail, MATS calculation seems 
to provide more rich structures, most likely due to larger 
number of beams included in the secular equation. 
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